AMENDMENTS TO THE CLAIMS:

This listing of claims will replace all prior versions, and listings, of claims in the 
application: 
Listing of Claims: 

1. (currently amended) A method for predicting a polyadenylation site comprising:
inputting a plurality of RNA transcript sequences or sequences derived from RNA
transcript sequences, wherein at least one sequence has its poly A or poly T tract
sequence;

searching for a polyadenylation site, wherein the polyadenylation is an adenine 
rich region at the 3' end of the sequence or a thymine rich region at the beginning 5' end
of the sequence; 

detecting the presence of polyadenylation signals neighboring the polyadenylation 
site by [[scanning]] analyzing [[the]] EST or RNA sequences or their corresponding genomic
DNA sequences. 

2. (currently amended) The method of Claim 1 wherein the step of searching for a 
polyadenylation site [[comprising]] comprises [[scanning]] analyzing the sequences for adenine
rich region at the 3' end of the sequence or a thymine rich region at the beginning 5' end
of the sequence. 
3. (original) The method of Claim 2 wherein the adenine rich region comprises
adenine in at least 50% of the region and the thymine rich region comprises thymine in at 
least 50% of the region. 

4. (original) The method of Claim 2 wherein the adenine rich region comprises 
adenine in at least 60% of the region and the thymine rich region comprises thymine in at 
least 60% of the region. 

5. (original) The method of Claim 2 wherein the adenine rich region comprises 
adenine in at least 70% of the region and the thymine rich region comprises thymine in at 
least 70% of the region. 

6. (original) The method of Claim 2 wherein the adenine rich region comprises 
adenine in at least 80% of the region and the thymine rich region comprises thymine in at 
least 80% of the region.
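The scan in Claims 2 to 6 can be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the function names, the 8-base minimum region length, the requirement that the region begin with the rich base, and the choice to report the longest qualifying terminal region are all hypothetical. A 5' thymine-rich head is handled by reverse-complementing the sequence so that it becomes a 3' adenine-rich tail; `threshold` plays the role of the 50%, 60%, 70% or 80% fractions recited above.

```python
from typing import Optional

# Complement table (A/C/G/T/N) used to turn a 5' poly-T head into a 3' poly-A tail.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def poly_a_start(seq: str, threshold: float = 0.8, min_len: int = 8) -> Optional[int]:
    """Index where the longest adenine-rich 3' suffix begins, or None.

    Hypothetical rule: suffix seq[i:] qualifies if it starts with A, is at
    least `min_len` bases long, and at least `threshold` of its bases are A.
    """
    seq = seq.upper()
    count_a = 0
    start = None
    # Walk from the 3' end toward the 5' end, keeping the A count of seq[i:].
    for i in range(len(seq) - 1, -1, -1):
        if seq[i] == "A":
            count_a += 1
            length = len(seq) - i
            if length >= min_len and count_a / length >= threshold:
                start = i  # keep going; the last hit is the longest suffix
    return start


def poly_t_end(seq: str, threshold: float = 0.8, min_len: int = 8) -> Optional[int]:
    """Length of the thymine-rich 5' prefix (the region is seq[:end]), or None."""
    rc = reverse_complement(seq)
    start = poly_a_start(rc, threshold, min_len)
    return None if start is None else len(rc) - start


print(poly_a_start("GCGTACGT" + "A" * 10))  # -> 8
print(poly_t_end("T" * 10 + "ACGTACGC"))    # -> 10
```

A bare fraction threshold lets the region grow into A-rich sequence upstream of the true tail (with `threshold=0.5` the first example extends back to index 4), which is one reason a position-aware score such as the one in Claim 7 is useful.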

7. (currently amended) The method of Claim 1 wherein a heuristic score
$n_A / (n_A + 0.5 \cdot \max(n_R - 20, 0))$ is used for detecting adenine rich or thymine rich
region; wherein $n_A$ is the number of adenines or thymines in the block, and $n_R$ is the
number of bases [[after]] downstream of the block of adenines or [[thymine]] thymines to the
end of the sequence.
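The heuristic score of Claim 7 is simple arithmetic. In the hypothetical sketch below, a "block" is assumed to be a maximal run of at least `min_block` consecutive A bases (the claim does not define block boundaries). Blocks ending within 20 bases of the 3' end score 1.0, and the score falls as the block lies further upstream.

```python
import re
from typing import List, Tuple


def poly_a_score(n_a: int, n_r: int) -> float:
    """Heuristic score n_A / (n_A + 0.5 * max(n_R - 20, 0)) from Claim 7."""
    penalty = 0.5 * max(n_r - 20, 0)
    if n_a + penalty == 0:
        return 0.0  # empty block with nothing downstream: nothing to score
    return n_a / (n_a + penalty)


def score_a_blocks(seq: str, min_block: int = 8) -> List[Tuple[int, float]]:
    """Return (start index, score) for every run of >= min_block A's in seq."""
    seq = seq.upper()
    results = []
    for m in re.finditer("A{%d,}" % min_block, seq):
        n_a = m.end() - m.start()  # bases in the block
        n_r = len(seq) - m.end()   # bases downstream of the block to the 3' end
        results.append((m.start(), poly_a_score(n_a, n_r)))
    return results


example = "A" * 10 + "C" * 40 + "A" * 12 + "G" * 5
print(score_a_blocks(example))
```

In the example, the internal block has 57 bases downstream and scores 10 / (10 + 0.5 * 37), about 0.35, while the terminal block has only 5 downstream bases and scores 1.0.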
8. (currently amended) A method for detecting polyadenylation signal in a sequence
with a polyadenylation site comprising searching for a polyadenylation signal hexamer in
the sequence [[before]] 5' of the polyadenylation site.
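For reference, the simplest form of the search in Claim 8 is a direct scan of a fixed window 5' of the site for known signal hexamers. The sketch below assumes the two canonical variants AATAAA and ATTAAA and a 40-base window; neither the hexamer list nor the window size comes from the claims, and the function name is hypothetical. Distances are reported as N - k, matching the notation of Claims 9 to 11, where k is the 1-based position of the hexamer's 3'-most base and N is the position of the base adjacent to the site.

```python
from typing import List, Sequence, Tuple

# Assumption: canonical poly(A) signal variants; not specified in the claims.
CANDIDATE_HEXAMERS = ("AATAAA", "ATTAAA")


def find_signal_hexamers(seq: str, site: int, window: int = 40,
                         hexamers: Sequence[str] = CANDIDATE_HEXAMERS
                         ) -> List[Tuple[int, str]]:
    """Find candidate hexamers lying entirely within `window` bases 5' of `site`.

    `site` is the 0-based index where the poly(A) tract begins, so the
    upstream sequence is seq[:site]. Each hit is (distance N - k, hexamer).
    """
    seq = seq.upper()
    first = max(0, site - window)
    hits = []
    for start in range(first, site - 5):
        hexamer = seq[start:start + 6]
        if hexamer in hexamers:
            k = start + 6  # 1-based position of the hexamer's 3'-most base
            hits.append((site - k, hexamer))
    return hits


upstream = "GCGC" + "AATAAA" + "G" * 15
print(find_signal_hexamers(upstream + "A" * 20, site=len(upstream)))  # -> [(15, 'AATAAA')]
```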

9. (currently amended) The method of Claim 8 wherein the searching comprises
evaluating the probability that there is a polyadenylation [[site]] signal: $\Pr(h=k \mid x)$ for
$k = 6, 7, \ldots, N$, wherein the sequence [[before]] 5' of the polyadenylation site is
$x = (x_1, x_2, \ldots, x_N)$ and where $x_N$ is the [[most]] 3'-most base [[before]] upstream of
the polyadenylation site.

10. (original) The method of Claim 9 wherein: $\Pr(h=k \mid x) = \Pr(x \mid h=k) \Pr(h=k) / \Pr(x)$.

11. (original) The method of Claim 10 wherein
$\Pr(h=k \mid x) = \Pr(x_{k-5}, \ldots, x_k \mid h=k) \Pr(h=k) / \Pr(x_{k-5}, \ldots, x_k)$
and wherein $\Pr(h=k)$ is the probability that the polyadenylation hexamer is located at
position k in the sequence, at a distance $(N-k)$ from the polyadenylation site,
$\Pr(x_{k-5}, \ldots, x_k \mid h=k)$ is the probability of observing the hexamer
$(x_{k-5}, \ldots, x_k)$ given that it is a polyadenylation signal and
$\Pr(x_{k-5}, \ldots, x_k \mid h \neq k)$ is the probability of observing the hexamer given that
it is not from a polyadenylation signal.
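Claims 9 to 11 can be evaluated position by position. The sketch below is hypothetical: it takes the signal likelihood, the background likelihood and the positional prior as caller-supplied functions, and the toy values in the example (a two-entry signal table, a uniform background of 4^-6 per hexamer, and a flat prior over the 20 nearest positions) are illustrative assumptions rather than trained parameters. As in Claim 11, the result is the ratio for each k; it is not renormalized over positions.

```python
from typing import Callable, Dict


def signal_posteriors(upstream: str,
                      signal_prob: Callable[[str], float],
                      background_prob: Callable[[str], float],
                      prior: Callable[[int], float]) -> Dict[int, float]:
    """Score Pr(h=k | x) for k = 6..N as in Claim 11.

    `upstream` is x = (x_1, ..., x_N), with x_N adjacent to the site; the
    hexamer ending at 1-based position k is x_{k-5}..x_k, and prior()
    receives its distance N - k from the site.
    """
    n = len(upstream)
    scores = {}
    for k in range(6, n + 1):
        hexamer = upstream[k - 6:k]       # 0-based slice holding x_{k-5}..x_k
        denom = background_prob(hexamer)  # Pr(x_{k-5}, ..., x_k)
        if denom > 0:
            scores[k] = signal_prob(hexamer) * prior(n - k) / denom
    return scores


# Hypothetical toy parameters, for illustration only.
SIGNAL_TABLE = {"AATAAA": 0.6, "ATTAAA": 0.2}
scores = signal_posteriors(
    "GCGCAATAAA" + "G" * 15,
    signal_prob=lambda h: SIGNAL_TABLE.get(h, 0.0),
    background_prob=lambda h: 4.0 ** -6,
    prior=lambda d: 0.05 if d < 20 else 0.0,
)
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

With these toy inputs the best-scoring position is k = 10, the AATAAA hexamer lying 15 bases upstream of the site.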

12. (currently amended) The method of Claim 11 wherein the step of detecting 
comprises using a gamma function to produce a density distribution which places the 
majority of its weight on the positions located 5 to 25 bases distant from the
polyadenylation site. 
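Claim 12 recites a gamma function; one plausible reading is a gamma probability density over the distance N - k. The shape and scale values below (4 and 4, giving a mean distance of 16 bases) are assumptions chosen so that most of the mass falls 5 to 25 bases from the site, and the density is discretized and normalized over integer distances. Only the Python standard library is used.

```python
import math
from typing import List


def gamma_pdf(d: float, shape: float, scale: float) -> float:
    """Gamma probability density at d (zero for d <= 0)."""
    if d <= 0:
        return 0.0
    return d ** (shape - 1) * math.exp(-d / scale) / (math.gamma(shape) * scale ** shape)


def distance_prior(max_distance: int, shape: float = 4.0, scale: float = 4.0) -> List[float]:
    """Pr(h = k) as a function of distance d = N - k, for d = 0..max_distance."""
    weights = [gamma_pdf(d, shape, scale) for d in range(max_distance + 1)]
    total = sum(weights)
    return [w / total for w in weights]


prior = distance_prior(60)
print(round(sum(prior[5:26]), 3))  # share of the prior on distances 5 to 25
```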
13. (original) The method of Claim 12 wherein $\Pr(x_{k-5}, \ldots, x_k \mid h \neq k)$, the
probability of observing the hexamer given that it is not from a polyadenylation signal, is
modeled using a second-order Markov model trained on data collected from human 3'UTRs.

14. (original) The method of Claim 13 wherein
$\Pr(x_{k-5}, \ldots, x_k \mid h \neq k) = \Pr(x_{k-5}) \Pr(x_{k-4} \mid x_{k-5}) \Pr(x_{k-3} \mid x_{k-5}, x_{k-4}) \Pr(x_{k-2} \mid x_{k-4}, x_{k-3}) \Pr(x_{k-1} \mid x_{k-3}, x_{k-2}) \Pr(x_k \mid x_{k-2}, x_{k-1})$
wherein the first term is a zero-order Markovian probability, the second is a first-order
Markovian probability and the remaining four terms are second-order Markovian probabilities.

15. (original) The method of Claim 14 wherein, for a k-th-order Markov model, the
probability of base b following a word w of length k is estimated by the frequency of the 
concatenated word (wb) divided by the frequency of the word w, where frequencies are 
computed from the training dataset of 3'UTR sequences.

16. (original) The method of Claim 15 wherein, for the case k = 0 (a zero-order 
Markovian model), the probability of base b is estimated by its frequency in the dataset 
divided by the size of the dataset. 
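Claims 13 to 16 describe the background model concretely enough to sketch. Below, word frequencies are counted over a training set (the one-sequence training data here is a placeholder, not human 3'UTRs), conditional probabilities are estimated as freq(wb) / freq(w) per Claim 15 and freq(b) / dataset size per Claim 16, and the hexamer probability is the product of one zero-order, one first-order and four second-order terms as in Claim 14. The function names are hypothetical, and no smoothing is applied, so any unseen word yields a probability of zero.

```python
from collections import Counter
from typing import Iterable


def word_counts(seqs: Iterable[str], max_len: int = 3) -> Counter:
    """Count every word of length 1..max_len in the training sequences."""
    counts: Counter = Counter()
    for s in seqs:
        s = s.upper()
        for length in range(1, max_len + 1):
            for i in range(len(s) - length + 1):
                counts[s[i:i + length]] += 1
    return counts


def cond_prob(counts: Counter, total_bases: int, w: str, b: str) -> float:
    """Pr(b | w): freq(wb) / freq(w), or freq(b) / dataset size when w is empty."""
    if not w:
        return counts[b] / total_bases
    return counts[w + b] / counts[w] if counts[w] else 0.0


def background_hexamer_prob(hexamer: str, counts: Counter, total_bases: int) -> float:
    """Pr(x_{k-5}, ..., x_k | h != k) under the second-order Markov model."""
    hexamer = hexamer.upper()
    p = 1.0
    for i, b in enumerate(hexamer):
        context = hexamer[max(0, i - 2):i]  # 0, then 1, then 2 preceding bases
        p *= cond_prob(counts, total_bases, context, b)
    return p


training = ["ACGTACGT"]  # placeholder training data
counts = word_counts(training)
total = sum(len(s) for s in training)
print(background_hexamer_prob("ACGTAC", counts, total))  # -> 0.125
```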

17. (currently amended) A computer readable medium comprising computer- 
executable instructions for performing the method comprising: 

inputting a plurality of RNA transcript sequences or sequences derived from RNA
transcript sequences, wherein at least one sequence has its poly A or poly T tract
sequence;
searching for a polyadenylation site, wherein the polyadenylation is an adenine
rich region at the 3' end of the sequence or a thymine rich region at the beginning 5' end
of the sequence; 

detecting the presence of polyadenylation signals neighboring the polyadenylation 
site by [[scanning]] analyzing [[the]] EST or RNA sequences or their corresponding genomic
DNA sequences. 

18. (currently amended) The computer readable medium of Claim 17 wherein the step 
of searching for a polyadenylation site [[comprising]] comprises [[scanning]] analyzing the
sequences for adenine rich region at the 3' end of the sequence or a thymine rich region at
the beginning 5' end of the sequence.

19. (original) The computer readable medium of Claim 18 wherein the adenine rich 
region comprises adenine in at least 50% of the region and the thymine rich region 
comprises thymine in at least 50% of the region. 

20. (original) The computer readable medium of Claim 19 wherein the adenine rich 
region comprises adenine in at least 60% of the region and the thymine rich region 
comprises thymine in at least 60% of the region. 

21. (original) The computer readable medium of Claim 20 wherein the adenine rich 
region comprises adenine in at least 70% of the region and the thymine rich region 
comprises thymine in at least 70% of the region. 
22. (original) The computer readable medium of Claim 21 wherein the adenine rich
region comprises adenine in at least 80% of the region and the thymine rich region 
comprises thymine in at least 80% of the region. 

23. (currently amended) The computer readable medium of Claim 17 wherein a 
heuristic score $n_A / (n_A + 0.5 \cdot \max(n_R - 20, 0))$ is used for detecting adenine rich or
thymine rich region; wherein $n_A$ is the number of adenines or thymines in the block, and
$n_R$ is the number of bases [[after]] downstream of the block of adenines or [[thymine]]
thymines to the end of the sequence.

24. (currently amended) A computer readable medium comprising computer-executable
instructions for performing the method comprising: searching for a polyadenylation signal
hexamer in the sequence [[before]] 5' of the polyadenylation site.

25. (currently amended) The computer readable medium of Claim 24 wherein the
searching comprises evaluating the probability that there is a polyadenylation [[site]]
signal: $\Pr(h=k \mid x)$ for $k = 6, 7, \ldots, N$, wherein the sequence [[before]] 5' of the
polyadenylation site is $x = (x_1, x_2, \ldots, x_N)$ and where $x_N$ is the 3'-most base
[[before]] upstream of the polyadenylation site.

26. (original) The computer readable medium of Claim 25 wherein:
$\Pr(h=k \mid x) = \Pr(x \mid h=k) \Pr(h=k) / \Pr(x)$.
27. (original) The computer readable medium of Claim 26 wherein:
$\Pr(h=k \mid x) = \Pr(x_{k-5}, \ldots, x_k \mid h=k) \Pr(h=k) / \Pr(x_{k-5}, \ldots, x_k)$
and wherein $\Pr(h=k)$ is the probability that the polyadenylation hexamer is located at
position k in the sequence, at a distance $(N-k)$ from the polyadenylation site,
$\Pr(x_{k-5}, \ldots, x_k \mid h=k)$ is the probability of observing the hexamer
$(x_{k-5}, \ldots, x_k)$ given that it is a polyadenylation signal and
$\Pr(x_{k-5}, \ldots, x_k \mid h \neq k)$ is the probability of observing the hexamer given that
it is not from the polyadenylation signal.

28. (currently amended) The computer readable medium of Claim 27 wherein the 
step of detecting comprises using a gamma function to produce a density distribution 
which places the majority of its weight on the positions located 5 to 25 bases distant from 
the polyadenylation site. 

29. (original) The computer readable medium of Claim 28 wherein
$\Pr(x_{k-5}, \ldots, x_k \mid h \neq k)$, the probability of observing the hexamer given that it
is not from a polyadenylation signal, is modeled using a second-order Markov model trained
on data collected from human 3'UTRs.

30. (original) The computer readable medium of Claim 29 wherein
$\Pr(x_{k-5}, \ldots, x_k \mid h \neq k) = \Pr(x_{k-5}) \Pr(x_{k-4} \mid x_{k-5}) \Pr(x_{k-3} \mid x_{k-5}, x_{k-4}) \Pr(x_{k-2} \mid x_{k-4}, x_{k-3}) \Pr(x_{k-1} \mid x_{k-3}, x_{k-2}) \Pr(x_k \mid x_{k-2}, x_{k-1})$,
wherein the first term is a zero-order Markovian probability, the second is a first-order
Markovian probability and the remaining four terms are second-order Markovian
probabilities.
31. (original) The computer readable medium of Claim 30 wherein, for a k-th-order
Markov model, the probability of base b following a word w of length k is estimated by 
the frequency of the concatenated word (wb) divided by the frequency of the word w, 
where frequencies are computed from the training dataset of 3'UTR sequences. 

32. (original) The computer readable medium of Claim 31 wherein, for the case k=0 
(a zero-order Markovian model), the probability of base b is estimated by its frequency in 
the dataset divided by the size of the dataset. 

33. (currently amended) A system comprising: a processor; and a memory coupled 
with the processor, the memory storing a plurality of machine instructions that cause the 
processor to perform logical steps of the method comprising: 

inputting a plurality of RNA transcript sequences or Expressed Sequence Tags
(EST) sequences derived from RNA transcript sequences, wherein at least one sequence
has its poly A or poly T tract sequence;

searching for a polyadenylation site, wherein the polyadenylation is an adenine 
rich region at the 3' end of the sequence or a thymine rich region at the beginning 5' end
of the sequence; 

detecting the presence of polyadenylation signals neighboring the polyadenylation 
site by [[scanning]] analyzing [[the]] EST or RNA sequences or their corresponding genomic
DNA sequences. 






Feb-24-06 11:01am From-Affymetrix, Inc. 408 731 5392 T-608 P.011/014 F-114

Application No.: 10/028,416 

34. (currently amended) The system of Claim 33 wherein the step of searching for a
polyadenylation site [[comprising]] comprises [[scanning]] analyzing the sequences for adenine
rich region at the 3' end of the sequence or a thymine rich region at the beginning 5' end
of the sequence. 

35. (original) The system of Claim 34 wherein the adenine rich region comprises 
adenine in at least 50% of the region and the thymine rich region comprises thymine in at 
least 50% of the region. 

36. (original) The system of Claim 35 wherein the adenine rich region comprises 
adenine in at least 60% of the region and the thymine rich region comprises thymine in at 
least 60% of the region. 

37. (original) The system of Claim 36 wherein the adenine rich region comprises 
adenine in at least 70% of the region and the thymine rich region comprises thymine in at 
least 70% of the region. 

38. (original) The system of Claim 37 wherein the adenine rich region comprises 
adenine in at least 80% of the region and the thymine rich region comprises thymine in at 
least 80% of the region. 

39. (currently amended) The system of Claim 33 wherein a heuristic score
$n_A / (n_A + 0.5 \cdot \max(n_R - 20, 0))$ is used for detecting adenine rich or thymine rich
region; wherein: $n_A$ is the number of adenines or thymines in the block, and $n_R$ is the
number of bases [[after]] downstream of the block of adenines or thymines to the end of the
sequence.

40. (currently amended) A system comprising a processor, and a memory coupled 
with the processor, the memory storing a plurality of machine instructions that cause the 
processor to perform logical steps of the method for detecting polyadenylation signal in a 
sequence with a polyadenylation site comprising: searching for a polyadenylation signal 
hexamer in the sequence [[before]] 5' of the polyadenylation site.

41. (currently amended) The system of Claim 40 wherein the searching comprises
evaluating the probability that there is a polyadenylation [[site]] signal: $\Pr(h=k \mid x)$ for
$k = 6, 7, \ldots, N$, wherein the sequence [[before]] 5' of the polyadenylation site is
$x = (x_1, x_2, \ldots, x_N)$ and where $x_N$ is the 3'-most base [[before]] upstream of the
polyadenylation site.

42. (original) The system of Claim 41 wherein: Pr(h=k|x) = Pr(x|h=k) Pr(h=k)/Pr(x). 

43. (original) The system of Claim 42 wherein
$\Pr(h=k \mid x) = \Pr(x_{k-5}, \ldots, x_k \mid h=k) \Pr(h=k) / \Pr(x_{k-5}, \ldots, x_k)$
and wherein $\Pr(h=k)$ is the probability that the polyadenylation hexamer is located at
position k in the sequence, at a distance $(N-k)$ from the polyadenylation site,
$\Pr(x_{k-5}, \ldots, x_k \mid h=k)$ is the probability of observing the hexamer
$(x_{k-5}, \ldots, x_k)$ given that it is a polyadenylation signal and
$\Pr(x_{k-5}, \ldots, x_k \mid h \neq k)$ is the probability of observing the hexamer given that
it is not from a polyadenylation signal.
44. (currently amended) The system of Claim 43 wherein the step of detecting
comprises using a gamma function to produce a density distribution which places the
majority of its weight on the positions located 5 to 25 bases distant from the 
polyadenylation site. 

45. (original) The system of Claim 44 wherein $\Pr(x_{k-5}, \ldots, x_k \mid h \neq k)$, the
probability of observing the hexamer given that it is not from a polyadenylation signal, is
modeled using a second-order Markov model trained on data collected from human 3'UTRs.

46. (original) The system of Claim 45 wherein
$\Pr(x_{k-5}, \ldots, x_k \mid h \neq k) = \Pr(x_{k-5}) \Pr(x_{k-4} \mid x_{k-5}) \Pr(x_{k-3} \mid x_{k-5}, x_{k-4}) \Pr(x_{k-2} \mid x_{k-4}, x_{k-3}) \Pr(x_{k-1} \mid x_{k-3}, x_{k-2}) \Pr(x_k \mid x_{k-2}, x_{k-1})$,
wherein the first term is a zero-order Markovian probability, the second is a first-order
Markovian probability and the remaining four terms are second-order Markovian probabilities.

47. (original) The system of Claim 46 wherein, for a k-th-order Markov model, the
probability of base b following a word w of length k is estimated by the frequency of the 
concatenated word (wb) divided by the frequency of the word w, where frequencies are 
computed from the training dataset of 3'UTR sequences. 

48. (original) The system of Claim 47 wherein, for the case k=0 (a zero-order 
Markovian model), the probability of base b is estimated by its frequency in the dataset 
divided by the size of the dataset. 